Using a Rich Feature Set for the Identification of German MWEs

Authors

  • Fabienne Cap
  • Marion Weller
  • Ulrich Heid
Abstract

Due to their formal variability and irregular behaviour on different levels of linguistic description, MWEs are a potential source of errors for many NLP applications, e.g. Machine Translation. While most known approaches to MWE identification focus on one dimension of irregular behaviour, we present an approach that combines morpho-syntactic features (extracted from dependency-parsed text) with semantic opacity features (approximated using word alignments). We trained supervised classifiers with different feature subsets and show that the combination of morpho-syntactic and semantic opacity features yields the best overall results.
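The feature-subset comparison described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the feature names (`det_fixedness`, `number_fixedness`, `alignment_entropy`), the toy values, and the nearest-centroid classifier are hypothetical stand-ins, since the abstract does not name the features, the data, or the learning algorithm used.

```python
# Illustrative sketch only: feature names, toy values and the nearest-centroid
# classifier are assumptions, not taken from the paper.

# Hypothetical grouping of features into the two dimensions named in the abstract.
FEATURE_GROUPS = {
    "morphosyntactic": ["det_fixedness", "number_fixedness"],
    "semantic_opacity": ["alignment_entropy"],
}

# Invented toy candidates: (feature dict, label).
TOY_TRAIN = [
    ({"det_fixedness": 0.9, "number_fixedness": 0.8, "alignment_entropy": 0.9}, "MWE"),
    ({"det_fixedness": 0.8, "number_fixedness": 0.9, "alignment_entropy": 0.7}, "MWE"),
    ({"det_fixedness": 0.2, "number_fixedness": 0.1, "alignment_entropy": 0.2}, "free"),
    ({"det_fixedness": 0.3, "number_fixedness": 0.2, "alignment_entropy": 0.1}, "free"),
]
TOY_TEST = [
    ({"det_fixedness": 0.85, "number_fixedness": 0.9, "alignment_entropy": 0.8}, "MWE"),
    ({"det_fixedness": 0.1, "number_fixedness": 0.2, "alignment_entropy": 0.15}, "free"),
]


def select(features, groups):
    """Project a feature dict onto the chosen feature groups, as a vector."""
    names = [name for group in groups for name in FEATURE_GROUPS[group]]
    return [features[name] for name in names]


def train_centroids(rows, labels):
    """Compute one mean vector (centroid) per class label."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


def evaluate(groups, train, test):
    """Train on the selected feature groups and return accuracy on the test set."""
    centroids = train_centroids(
        [select(f, groups) for f, _ in train], [label for _, label in train]
    )
    hits = sum(predict(centroids, select(f, groups)) == label for f, label in test)
    return hits / len(test)


if __name__ == "__main__":
    for subset in (["morphosyntactic"], ["semantic_opacity"],
                   ["morphosyntactic", "semantic_opacity"]):
        print(subset, evaluate(subset, TOY_TRAIN, TOY_TEST))
```

The toy data is too small and too cleanly separated to reproduce the paper's finding; the sketch only shows the shape of the experiment, in which the same classifier is trained once per feature subset and the scores are compared.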

Similar resources

Semantically Motivated Hebrew Verb-Noun Multi-Word Expressions Identification

Identification of Multi-Word Expressions (MWEs) lies at the heart of many natural language processing applications. In this research, we deal with a particular type of Hebrew MWEs, Verb-Noun MWEs (VN-MWEs), which combine a verb and a noun with or without other words. Most prior work on MWE classification focused on linguistic and statistical information. In this paper, we claim that it is essen...

A Repository of Variation Patterns for Multiword Expressions

One of the crucial issues in the analysis and processing of MWEs is their internal variability. Indeed, the feature that most characterises MWEs is their fixedness at some level of linguistic analysis, be it morphology, syntax, or semantics. The morphological aspect is not trivial in languages which exhibit a rich morphology, such as Romance languages. The issue is relevant in at least three ...

A General Investigation on the Combination of Local and Global Feature Selection Methods for Request Identification in Telegram

Nowadays, the use of various messaging services is expanding worldwide with the rapid development of Internet technologies. Telegram is a cloud-based open-source text messaging service. According to the US Securities and Exchange Commission and based on the statistics given for October 2019 to present, 300 million people worldwide used Telegram per month. Telegram users are more concentrated in...

Extraction of German Multiword Expressions from Parsed Corpora Using Context Features

We report on tools for the extraction of German multiword expressions (MWEs) from text corpora; we extract word pairs, but also longer MWEs of different patterns, e.g. verb-noun structures with an additional prepositional phrase or adjective. In addition to standard association-based extraction, we focus on morpho-syntactic, syntactic and lexical-choice features of the MWE candidates. A broad range...

A Lexical Resource of Hebrew Verb-Noun Multi-Word Expressions

A verb-noun Multi-Word Expression (MWE) is a combination of a verb and a noun with or without other words, in which the combination has a meaning different from the meaning of the words considered separately. In this paper, we present a new lexical resource of Hebrew Verb-Noun MWEs (VN-MWEs). The VN-MWEs of this resource were manually collected and annotated from five different web resources. I...

Publication date: 2013